*** descriptive statistics
sum averagemark
sum averagemark if dataround==0
sum averagemark if dataround==1
sum averagemark if dataround==2 & gpt35==0
sum averagemark if dataround==2 & gpt35==1

sum wordcount
sum wordcount if dataround==0
sum wordcount if dataround==1
sum wordcount if dataround==2 & gpt35==0
sum wordcount if dataround==2 & gpt35==1
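
* Sketch (an assumption, not part of the original analysis): the summaries above
* collected in one table. cond_grp is a hypothetical helper variable that numbers
* each observed dataround x gpt35 combination; observations with a missing value
* in either variable are not assigned to a group.
egen cond_grp = group(dataround gpt35), label
tabstat averagemark wordcount, by(cond_grp) statistics(n mean sd min max) columns(statistics) longstub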


*** density plot of word count
twoway kdensity wordcount if dataround==0 & gpt35==0, color(blue) ///
    || kdensity wordcount if dataround==1 & gpt35==0, color(red) ///
    || kdensity wordcount if dataround==2 & gpt35==0, color(green) ///
    || kdensity wordcount if dataround==2 & gpt35==1, color(yellow)
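
* Sketch (optional variant, not part of the original analysis): the same four
* densities with an explicit legend, so each curve can be identified without
* reading the colour codes. The legend and axis labels are illustrative wording.
twoway kdensity wordcount if dataround==0 & gpt35==0, lcolor(blue) ///
    || kdensity wordcount if dataround==1 & gpt35==0, lcolor(red) ///
    || kdensity wordcount if dataround==2 & gpt35==0, lcolor(green) ///
    || kdensity wordcount if dataround==2 & gpt35==1, lcolor(yellow) ///
    legend(order(1 "dataround 0" 2 "dataround 1" 3 "dataround 2, gpt35=0" 4 "dataround 2, gpt35=1")) ///
    xtitle("Word count") ytitle("Density")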

*** box plots of marks by marker and Bloom level
graph box marker1 marker2, over(bloomlevel)


***Testing for heteroscedasticity, skewness, and kurtosis 
regress averagemark i.bloomlevel i.dataround i.gpt35
estat szroeter, rhs mtest(holm) 
** no significant heteroscedasticity, although the p-value for the gpt35 variable (0.079) comes close to 0.05
estat imtest
** evidence of heteroscedasticity (p = 0.044, just below the 0.05 cut-off); the main analysis therefore uses the Huber/White/sandwich VCE provided by Stata's vce(robust)

***Ramsey (1969) RESET test (REgression Specification-Error Test)
regress averagemark i.bloomlevel i.dataround i.gpt35
estat ovtest 
** no evidence of misspecification (p = 0.895)


*** main analysis
regress averagemark i.bloomlevel, vce(robust)
regress averagemark i.bloomlevel i.dataround i.gpt35, vce(robust)
margins bloomlevel#dataround if gpt35==0
marginsplot

*** Dfbeta assessment
regress averagemark i.bloomlevel i.dataround i.gpt35
dfbeta, stub(beta) 
* using the 2/sqrt(N) cut-off point (0.182574186, which corresponds to N = 120)
scatter beta1 beta2 beta3 beta4 beta5 beta6 beta7 obsindex, ylabel(-1(.5)3) yline(.183 -.183) mlabel(obsindex obsindex obsindex obsindex obsindex obsindex obsindex )
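
* Sketch (an assumption, not the author's procedure): count, per observation, how
* many dfbetas exceed 2/sqrt(N) in absolute value, and list the observations where
* this happens for more than one predictor (the exclusion rule applied in the next step).
* dfb_cutoff and n_exceed are hypothetical names; the listing may not reproduce
* the hand-picked exclusion list exactly.
scalar dfb_cutoff = 2/sqrt(e(N))
display "dfbeta cut-off: " scalar(dfb_cutoff)
generate byte n_exceed = 0
foreach v of varlist beta1 beta2 beta3 beta4 beta5 beta6 beta7 {
    * missing dfbetas are not counted as exceeding the cut-off
    quietly replace n_exceed = n_exceed + 1 if abs(`v') > scalar(dfb_cutoff) & !missing(`v')
}
list obsindex n_exceed if n_exceed > 1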
**removing observations whose dfbeta falls outside the threshold for more than one predictor
regress averagemark i.bloomlevel i.dataround i.gpt35 ///
    if !inlist(obsindex, 4, 11, 13, 14, 17, 33, 35, 53, 69, 79, 88, ///
        93, 102, 103, 109, 110, 111, 112, 114, 115, 116, 119), vce(robust)




***robustness check with reconciled marks***
***Testing for heteroscedasticity
regress recconciled i.bloomlevel i.dataround i.gpt35
estat szroeter, rhs mtest(holm) 
** evidence of heteroscedasticity for gpt35 (p = 0.0429, just below the 0.05 cut-off); the models below therefore use the Huber/White/sandwich VCE provided by Stata's vce(robust)

***Ramsey (1969) RESET test (REgression Specification-Error Test)
regress recconciled i.bloomlevel i.dataround i.gpt35
estat ovtest 
** no evidence of misspecification (p = 0.701)

*** reproduced main analysis
regress recconciled i.bloomlevel, vce(robust)
regress recconciled i.bloomlevel i.dataround i.gpt35, vce(robust)
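
* Sketch (not part of the original analysis): put the full model for the averaged
* and the reconciled marks side by side, which makes the robustness comparison
* easier to read. m_avg and m_rec are hypothetical names for the stored estimates.
quietly regress averagemark i.bloomlevel i.dataround i.gpt35, vce(robust)
estimates store m_avg
quietly regress recconciled i.bloomlevel i.dataround i.gpt35, vce(robust)
estimates store m_rec
estimates table m_avg m_rec, b(%9.3f) se(%9.3f) stats(N r2)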

*** Dfbeta assessment
regress recconciled i.bloomlevel i.dataround i.gpt35
dfbeta, stub(beta2) 
* using the 2/sqrt(N) cut-off point (0.182574186, which corresponds to N = 120)
scatter beta21 beta22 beta23 beta24 beta25 beta26 beta27 obsindex, ylabel(-1(.5)3) yline(.183 -.183) mlabel(obsindex obsindex obsindex obsindex obsindex obsindex obsindex )
regress recconciled i.bloomlevel i.dataround i.gpt35 ///
    if !inlist(obsindex, 4, 11, 13, 14, 15, 17, 33, 35, 53, 69, 79, 88, ///
        93, 102, 103, 109, 110, 111, 112, 114, 115, 116, 119), vce(robust)

